Cloud Infrastructure Troubleshooting Guide
Common Issues and Solutions
1. "Could not connect to registration service. Error: fetch failed"
**Symptom:** The signup page shows a connection error when registering a user.
**Root Cause:** Backend URL environment variables are defined in both the infrastructure config AND as secrets. When both exist, the **secret takes precedence** and may have outdated/incorrect values.
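One way to confirm this cause is to compare the variable names in the config's `[env]` block with the names of the stored secrets. The following is a minimal sketch, assuming the config uses the `KEY = 'value'` layout shown in this guide and that the secret names have been saved one per line to a file. The output format of `atom-cli secrets list` is not specified here, so extracting the names is left to the reader. The function names `env_keys` and `shadowed_keys` are hypothetical.

```shell
# Print the variable names defined under [env] in a config file ($1).
# Assumes lines of the form: KEY = 'value'
env_keys() {
  awk '
    /^\[/ { in_env = ($0 ~ /^\[env\]/); next }   # track which section we are in
    in_env && /^[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ {
      split($0, parts, "=")
      gsub(/[ \t]/, "", parts[1])
      print parts[1]
    }
  ' "$1"
}

# Print names defined both in the config [env] block ($1) and in a file
# of secret names, one per line ($2). Exit status is 0 if any are found.
shadowed_keys() {
  env_keys "$1" | grep -Fxf "$2"
}
```

Any name this prints is defined in both places, so the secret value is the one in effect.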
**Solution:**
# Remove duplicate secrets so config values take effect
atom-cli secrets unset PYTHON_BACKEND_URL BACKEND_URL
# Verify the correct values are now active
atom-cli secrets list | grep -i backend
**Correct Configuration:**
[env]
PYTHON_BACKEND_URL = 'https://[tenant].atomagentos.com/api'
API_BASE_URL = 'https://[tenant].atomagentos.com/api'
**Verification:**
# Test backend health
curl -s https://[tenant].atomagentos.com/api/v1/health
# Test signup endpoint
curl -s -X POST https://[tenant].atomagentos.com/api/auth/signup \
-H "Content-Type: application/json" \
-d '{"name":"Test User","email":"test@example.com","password":"testPass123","subdomain":"test-workspace"}'
2. App Not Listening on Expected Port
**Symptom:** Warning after deployment: "The app is not listening on the expected address"
**Explanation:** This warning is expected for the main app, which runs Next.js on port 3000. The Python backend runs in a separate process on port 8000.
**Correct Process List:**
next-server (v | 0.0.0.0:3000 # Expected for Next.js
[ATOM Cloud Console Access Enabled]
**If you see:** python | 0.0.0.0:8000 in the main app, that's incorrect - Python runs only in the API service.
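The check described in this section can be scripted. The following is a rough sketch that reads a process list on stdin, assuming each listener appears as `name | address:port` as in the sample above. The function name `check_main_app_listeners` is hypothetical, and the real deployment output may be laid out differently.

```shell
# Read a process list on stdin (assumed format: "name | address:port")
# and report whether it matches what this section describes.
check_main_app_listeners() {
  awk -F'|' '
    {
      name = $1; addr = $2
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the process name
      sub(/[ \t]*#.*/, "", addr)          # drop any trailing comment
      gsub(/[ \t]/, "", addr)             # drop remaining whitespace
    }
    name ~ /^next-server/ && addr ~ /:3000$/ { next_ok = 1 }
    name ~ /^python/      && addr ~ /:8000$/ { python_seen = 1 }
    END {
      if (python_seen) { print "unexpected: python listening on 8000"; exit 1 }
      if (!next_ok)    { print "missing: next-server on 3000"; exit 1 }
      print "ok"
    }
  '
}
```

Lines without a `|` separator, such as the console-access line, are ignored by both checks.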
3. API App Shows Stopped Machines
# Check health
atom-cli status
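Stopped machines may take a moment to start serving again, so a single probe can be misleading. The following is a minimal retry helper; the name `retry` and its arguments are illustrative and not part of `atom-cli`.

```shell
# Run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Note: uses plain (global) shell variables to stay POSIX-compatible.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Example (not run here): poll the health endpoint used elsewhere in this guide.
# retry 5 10 curl -sf "https://[tenant].atomagentos.com/api/v1/health"
```

With `curl -f`, an HTTP error status counts as a failed attempt, so the loop continues until the endpoint answers successfully or the attempts run out.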
4. Environment Variable Precedence
**Priority (highest to lowest):**
- Vault Secrets (atom-cli secrets set)
- Infrastructure configuration
- Build arguments
- System defaults
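The priority order above amounts to "the first source that defines the variable wins". The following is a small sketch of that rule, treating an empty string as "not defined"; that treatment is an assumption made for illustration, and the function name `first_set` is hypothetical.

```shell
# Print the first non-empty argument.
# Arguments are passed highest priority first:
#   secret, infrastructure config, build argument, system default.
first_set() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example: a stale secret shadows the value from the config.
# first_set "https://old.example/api" "https://new.example/api" "" ""
```

This is why removing the secret, as in issue 1, lets the config value take effect: the next source in the list then becomes the first one defined.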
5. Unified Architecture
**Main App**:
- Next.js frontend + Python web/worker processes
- Runs on port 3000 (Next.js)
- Deployed from root Dockerfile
**Commands:**
# Deploy app
atom-cli deploy
# Check status
atom-cli status
# View logs
atom-cli logs
6. Database Migrations
**When to run:** After deploying atom-saas-api with new schema changes.
**Command:**
# From backend-saas directory
cd backend-saas
alembic upgrade head
# Verify current version
alembic current
**Automatic:** Migrations run automatically via the release command in infrastructure.config.
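To check whether the automatic run applied everything, the output of `alembic current` can be inspected: Alembic appends `(head)` to the revision when the database is at the latest migration. The following is a minimal sketch; the helper name `at_head` is hypothetical.

```shell
# Succeeds if `alembic current` output (read on stdin) marks a revision as head.
at_head() {
  grep -q '(head)'
}

# Example (not run here), from the backend-saas directory:
# alembic current 2>/dev/null | at_head || echo "migrations pending"
```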
7. Custom Domain Issues
**Symptom:** API calls fail on custom domain but work on default subdomain.
**Solution:**
# Verify custom domain is configured
atom-cli network list
atom-cli certificates list
# Check DNS propagation
dig [tenant].atomagentos.com
**CORS Configuration:** Ensure the backend's CORS settings allow the custom domain.
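A quick way to test the CORS side is to send a request with an `Origin` header and look for `Access-Control-Allow-Origin` in the response. The following is a sketch that inspects response headers read from stdin. The helper name `cors_allows` and the domain `app.example.com` are placeholders, and the check covers only an exact origin match or `*`.

```shell
# Read HTTP response headers on stdin and succeed if
# Access-Control-Allow-Origin permits the given origin (exact match or "*").
cors_allows() {
  origin=$1
  tr -d '\r' | awk -v origin="$origin" '
    tolower($1) == "access-control-allow-origin:" {
      if ($2 == origin || $2 == "*") found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Example (not run here); app.example.com stands in for the custom domain:
# curl -s -o /dev/null -D - -H "Origin: https://app.example.com" \
#   "https://[tenant].atomagentos.com/api/v1/health" | cors_allows "https://app.example.com"
```

If the header is missing or names a different origin, browsers block the response even though the same request succeeds from `curl`, which matches the symptom above.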
Quick Health Checks
# Backend health
curl -s https://[tenant].atomagentos.com/api/health | jq .
# Frontend health
curl -s https://[tenant].atomagentos.com/health | jq .
# Check app status
atom-cli status
# Verify environment variables (no duplicate secrets)
atom-cli secrets list | grep -i backend || echo "✅ No backend URL secrets (correct)"
Emergency Recovery
Rollback to Previous Deployment
# List deployments
atom-cli deployments
# Rollback to specific version
atom-cli deployments rollback <version>
# Or just redeploy last known good commit
git checkout <commit-hash>
atom-cli deploy
Restore from Backup
# Database (Neon Point-in-Time Restore)
# See Neon console or API for PITR
# Secrets backup (you should have these locally)
atom-cli secrets list > backup-secrets.txt
Monitoring
# Real-time logs
atom-cli logs --lines 100
# Check resource usage
atom-cli status --resources
# Scale up if needed
atom-cli scale --cpu 2 --memory 2048
Related Documentation
- infrastructure.config - Main app configuration
- Dockerfile - Main app build
- docker-entrypoint.sh - Startup script
- backend-saas/scripts/run_migrations.sh - Migration script
Version History
- **2026-02-23:** Documented "fetch failed" signup issue caused by duplicate environment variable secrets